Abstract

The notion of mind as symbol processor is fundamental to AI and cognitive science, but some connectionists are now arguing against it. Eliminative connectionism challenges the validity of formal symbol manipulation as a level of mental description. We review some of the claims that have been made, and argue that most connectionist models, especially those constructed by learning algorithms, are operating at the level of pattern classifiers. Their (rather limited) success using non-symbolic representations demonstrates that they have not yet even approached the tasks which symbol processing models attempt to solve. Continued progress in connectionist research may require reimplementation rather than rejection of the symbolic level.


1. Introduction

The first thirty years of AI research proceeded on the entirely plausible assumption that the mind was a symbol processor. This idea has recently been challenged by a group of cognitive scientists known as connectionists, who construct neural network models which they claim have no equivalent description at the formal symbolic level. Rumelhart and McClelland (1986a) and McClelland and Rumelhart (1986) provide a good introduction to the connectionist approach, also known as “parallel distributed processing.”

The controversy surrounding connectionism has been heating up recently in response to three important papers. Smolensky (in press) provides the definitive statement of the hypotheses underlying connectionism. Pinker and Prince (1987) and Fodor and Pylyshyn (1987) challenge the connectionist view, both by attacking general claims that others have made about these networks and by criticising particular models that have appeared in the literature. A review of the current status of connectionist symbol processing will allow the interested reader to follow the debate as it unfolds.

As in Pinker and Prince (1987, pp. 4-7), we divide connectionism into three schools. “Implementational connectionism” is concerned with how massively parallel architectures might implement the classical notion of symbol processing. Examples include Touretzky and Hinton’s neural network production system interpreter (Touretzky and Hinton, 1985) and Ballard’s connectionist implementation of resolution (Ballard, 1986). “Eliminative connectionism,” on the other hand, denies the validity of symbolic-level descriptions. (Smolensky’s paper serves as a sort of “Eliminativist Manifesto.”) Eliminativists believe that to accurately explain what goes on in the mind one must shift to a hypothetical sub-symbolic level, more abstract than the neural level, but also fundamentally different from (and not merely an implementation of) symbolic-level computation. Finally, “revisionist-symbol-processing connectionism” is suggested by Pinker and Prince as a middle ground where discoveries might lead to fundamental changes in our understanding of symbol processing without forcing us to abandon the classical commitment to symbolic-level descriptions.

This paper is primarily concerned with the eliminative position, not because it is preferred, but because, as the most radical version of connectionism, it is most at odds with the classical account of intelligence.


2. The Symbolic-Level Paradigm

The symbolic-level paradigm underlies most research in AI and cognitive science. Language, commonsense reasoning, and conscious problem solving can all be described at this level in terms of structures, composed of symbols, that are manipulated by formal rules. Parse trees, semantic nets, frames, scripts, and axiom sets are examples of composite symbol structures. Their manipulation can be specified in various ways, some of which are computational, such as production rules, theorem provers, and Lisp functions, and others which are descriptive but not necessarily computational, such as the rules linguists write to describe syntactic or phonological processes.

The claim made by the symbolic-level paradigm is that intelligent behavior can be adequately explained purely in terms of formal operations on symbol structures. Newell (1980a) calls this the Physical Symbol System Hypothesis.

In other words, the mind contains symbol structures for concepts, goals, intentions, memories, and so forth, and intelligence derives from the effective manipulation of these structures. Eliminative connectionism denies this. Before getting into what the connectionists would have in place of symbols and rules, I should emphasize a key point in the definition of the symbolic-level paradigm. It isn’t just a claim that the mind works by manipulating symbols; it is a claim that the structures the mind manipulates can be directly identified with the elements of mental life: they are words, thoughts, percepts, not arbitrary, meaningless atoms.

Consider a thermostat with setpoint T₀ whose behavior is governed by the following rule:

IF T < T₀
THEN turn-on(furnace)
ELSE turn-off(furnace)

This rule constitutes a symbolic-level theory of thermostats. It is expressed in terms of ambient temperature, setpoint, and furnace activity: the language of the thermostatic domain. It does not refer to the individual atoms that make up the thermostat, or to the motions of particles in the atmosphere of the room. It is a formal theory because it can be implemented, and it accurately predicts the thermostat’s behavior. According to the symbolic-level paradigm, mental processes can also be explained by formal theories, without reference to phenomena such as neuron firings that exist only at a lower level of description.
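Because the rule above is a formal theory, it can be run directly. The following minimal Python sketch assumes the function name `thermostat_rule` and string-valued outputs purely for illustration; it restates the rule and adds nothing to it.

```python
def thermostat_rule(t: float, t0: float) -> str:
    """Symbolic-level theory of a thermostat with setpoint t0.

    The rule mentions only domain-level quantities (ambient temperature,
    setpoint, furnace state), never the atoms of the device or the air.
    """
    # IF T < T0 THEN turn-on(furnace) ELSE turn-off(furnace)
    if t < t0:
        return "turn-on(furnace)"
    return "turn-off(furnace)"


if __name__ == "__main__":
    for t, t0 in [(15.0, 20.0), (22.0, 20.0), (20.0, 20.0)]:
        print(f"T={t}, T0={t0}: {thermostat_rule(t, t0)}")
```

Note that the rule predicts the furnace state for any pair of temperatures, including ones never observed, which is what makes it a theory rather than a table of observations.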

3. Are There Rules?

If there are explicit representations of rules in the head, there must be an interpreter to execute the rules as thinking proceeds. Conscious problem solving behavior does appear to be rule-based. For example, John Anderson’s ACT* model, which learns new production rules as it gains experience at tasks such as proving geometry theorems, offers a good account of how humans behave when performing the same tasks (Anderson, 1983; Anderson, in press). But it is important to distinguish between conscious, deliberate behavior and intuitive behavior. The latter is not explainable by introspection, nor is it decomposable into consciously accessible steps such as occur in problem solving.
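As a rough illustration of what an interpreter for explicitly represented rules involves, here is a minimal recognize-act loop in Python. It is not ACT* or any published production system; the `Rule` class, the first-match conflict resolution, and the toy facts are all hypothetical choices made to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A production rule held as data: IF every condition is in memory THEN add facts."""
    name: str
    conditions: frozenset[str]
    additions: frozenset[str]


def run(rules: list[Rule], memory: set[str], max_cycles: int = 100) -> list[str]:
    """Minimal recognize-act interpreter.

    Updates `memory` in place and returns the names of the rules fired, in order.
    Conflict resolution is simply "first matching rule in list order" (an
    assumption made for brevity; real production systems use richer strategies).
    """
    fired = []
    for _ in range(max_cycles):
        for rule in rules:
            # Match: all conditions present, and firing would add something new.
            if rule.conditions <= memory and not rule.additions <= memory:
                memory |= rule.additions  # act
                fired.append(rule.name)
                break
        else:
            break  # quiescence: no rule matched on this cycle
    return fired


# Hypothetical rule set and working memory.
rules = [
    Rule("mortal", frozenset({"human(socrates)"}), frozenset({"mortal(socrates)"})),
    Rule("human", frozenset({"philosopher(socrates)"}), frozenset({"human(socrates)"})),
]

if __name__ == "__main__":
    memory = {"philosopher(socrates)"}
    print(run(rules, memory))  # ['human', 'mortal']
```

The rules here are passive data; nothing happens until the interpreter loop matches and fires them, which is the sense in which explicit rules presuppose an interpreter.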

Intuitive-level phenomena certainly include such things as vision and motor control, which operate almost entirely below the level of conscious thought. Language and common sense reasoning also proceed largely at the subconscious level, and appear to be intuitive. Smolensky suggests that our linguistic facility actually serves as the rule interpreter for conscious problem solving, and that what we perceive as consciousness is a series of snapshots of the state of an intuitive processor that is not itself rule-based. Note, however, that rule-based behavior isn’t necessarily conscious. Newell (1980b) used the production rule formalism to speculate about mental implementations of the Harpy speech understanding system.

In linguistics, the goal has been to explain phenomena by deriving the most economical set of rules that account for the data. Linguists shy away from claiming that these formal rules, with their associated interpreter, are what is actually in the head (Stabler, 1983; Thompson, 1983). However, models of linguistic development (as opposed to competence) are often phrased in terms of rule acquisition and revision, which may require an explicit representation for rules.

The symbolic-level paradigm is about the description of behavior in terms of symbols and rules; it says nothing about the explicit representation of rules. In the case of the thermostat, which clearly has no rule interpreter inside it, the physical structure of the device induces certain causal relationships between the ambient temperature, setpoint, and furnace activity which are accurately summarized by the rule we gave previously. That is all the symbolic-level paradigm requires.

4. Symbolic-Level Connectionism

Some connectionist models identify symbols with particular units. These are known as localist models, to distinguish them from the distributed models that are the focus of this paper. Cottrell (1985) and Waltz and Pollack (1985) use a localist representation in which individual units stand for words or word senses; Shastri (1985) uses units to denote classes and properties in an inheritance hierarchy; Selman (1985) and Fanty (1986) use units to denote input atoms and grammatical tokens in networks that parse context-free languages. Localist models may have interesting dynamical properties; e.g., when units denoting competing word senses inhibit each other, they are in effect “fighting” to settle on the most plausible interpretation of the input. In Pollack’s model of the garden path sentence “The astronomer married the star,” the ASTRONOMER unit causes the HEAVENLY-BODY unit to have a higher initial activation than MOVIE-STAR, but HEAVENLY-BODY eventually loses out due to constraints imposed by MARRIED, much as humans revise their initial interpretation the first time they hear the sentence.
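The competition described above can be sketched as a tiny localist network in Python. The unit names follow the example, but the weights, the decay constant, and the intermediate ANIMATE-OBJ unit (standing in for the constraint MARRIED places on its object) are invented for illustration; this is not Waltz and Pollack’s actual network. Word units are clamped on, sense units update synchronously, and the two senses inhibit each other.

```python
# Hypothetical weights chosen for illustration; NOT the published network.
WEIGHTS = {  # (source, target): weight
    ("STAR", "HEAVENLY-BODY"): 0.5,
    ("STAR", "MOVIE-STAR"): 0.5,
    ("ASTRONOMER", "HEAVENLY-BODY"): 0.3,   # semantic priming
    ("MARRIED", "ANIMATE-OBJ"): 0.5,        # MARRIED wants an animate object...
    ("ANIMATE-OBJ", "MOVIE-STAR"): 0.6,     # ...which only the MOVIE-STAR sense satisfies
    ("HEAVENLY-BODY", "MOVIE-STAR"): -0.7,  # competing senses inhibit each other
    ("MOVIE-STAR", "HEAVENLY-BODY"): -0.7,
}
INPUTS = {"ASTRONOMER": 1.0, "MARRIED": 1.0, "STAR": 1.0}  # clamped word units
HIDDEN = ["HEAVENLY-BODY", "MOVIE-STAR", "ANIMATE-OBJ"]


def settle(steps: int = 30, decay: float = 0.5) -> list[dict[str, float]]:
    """Synchronous updates a <- clip(decay * a + net, 0, 1); returns the activation history."""
    act = {u: 0.0 for u in HIDDEN}
    history = [dict(act)]
    for _ in range(steps):
        state = {**INPUTS, **act}  # net inputs use the previous step's activations
        new = {}
        for unit in HIDDEN:
            net = sum(w * state[src] for (src, dst), w in WEIGHTS.items() if dst == unit)
            new[unit] = min(1.0, max(0.0, decay * act[unit] + net))
        act = new
        history.append(dict(act))
    return history


if __name__ == "__main__":
    hist = settle()
    for t in (1, 3, 5, len(hist) - 1):
        print(t, round(hist[t]["HEAVENLY-BODY"], 2), round(hist[t]["MOVIE-STAR"], 2))
```

With these made-up parameters HEAVENLY-BODY leads after the first update, because ASTRONOMER’s priming arrives immediately, while the support MARRIED sends to MOVIE-STAR arrives one step later through ANIMATE-OBJ; mutual inhibition then lets MOVIE-STAR win.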

Fodor and Pylyshyn’s criticism of the localist approach focuses on the inability to compose symbols when they are tied directly to processing units. They point out that although one can designate individual units to stand for P, Q, and P&Q, the fact that P&Q references P cannot be expressed. An excitatory connection from P&Q to P would allow the network to infer P whenever P&Q is asserted to be true. But the network cannot decompose P&Q to get P, nor can it compose new structures such as P&Q&R from already existing ones. However, Fodor and Pylyshyn go too far when they claim distributed models suffer the same difficulty; Touretzky (1986) and Touretzky and Geva (1987) provide counterexamples.
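The contrast can be made concrete in a short Python sketch. The localist half hard-wires one unit per proposition, as in Fodor and Pylyshyn’s example; the symbolic half uses nested tuples, an arbitrary choice standing in for any composite symbol structure. All names are hypothetical.

```python
# --- Localist network: one unit per proposition, all fixed in advance ---
UNITS = {"P", "Q", "P&Q"}
# Excitatory links from the conjunction unit to its conjuncts. The label
# "P&Q" is only a name for the human reader; the network cannot look inside it.
EXCITES = {"P&Q": {"P", "Q"}}


def propagate(asserted: set[str]) -> set[str]:
    """One pass of spreading activation from the asserted units."""
    active = set(asserted)
    for unit in asserted:
        active |= EXCITES.get(unit, set())
    return active


# --- Symbolic structures: conjunctions are composite and decomposable ---
def conj(*parts):
    """Build a new conjunction from existing structures."""
    return ("AND", *parts)


def conjuncts(expr):
    """Decompose a conjunction into its immediate parts."""
    if isinstance(expr, tuple) and expr[0] == "AND":
        return expr[1:]
    return (expr,)


if __name__ == "__main__":
    print(propagate({"P&Q"}))  # P&Q also activates P and Q
    print("P&Q&R" in UNITS)    # False: no unit was allocated for it
    pq = conj("P", "Q")
    pqr = conj(pq, "R")        # composed from an existing structure
    print(conjuncts(pqr))      # (('AND', 'P', 'Q'), 'R')
```

The localist network can only "infer" what its hard-wired links encode, whereas `conj` and `conjuncts` work on structures that did not exist when the code was written.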

Most connectionists view the localist approach as a temporary compromise that allows them to conveniently explore certain dynamic constraint satisfaction phenomena. When the full power of the classical symbol processing model has been implemented in a distributed connectionist architecture, the localist approach may no longer be attractive.

5. The Sub-symbolic Paradigm

In distributed connectionist models, symbols and symbol structures are represented by patterns of activity over a collection of units, rather than by individual units. Symbols are then points in a high-dimensional metric space, with a natural similarity measure being the dot product. Although, as Fodor and Pylyshyn note (p. 58), one may impose arbitrary similarity measures on conventional symbol systems, in the connectionist case the similarity effects are rooted in the causal structure of the model.
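A small Python sketch of the point about similarity, using invented binary patterns over eight units. The second half shows one sense in which similarity is causal: a unit’s net input is the dot product of its incoming weights with the current activity pattern, so a unit tuned to one pattern responds in proportion to its overlap with another.

```python
def dot(u: list[float], v: list[float]) -> float:
    """Dot product: the natural similarity measure between activity patterns."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical activity patterns over the same eight units (values invented).
PATTERNS = {
    "cat": [1, 1, 0, 1, 0, 0, 1, 0],
    "dog": [1, 1, 0, 0, 1, 0, 1, 0],
    "car": [0, 0, 1, 0, 0, 1, 0, 1],
}

# A unit whose incoming weights happen to match the "cat" pattern. Its net
# input is the dot product of those weights with the current activity, so it
# responds partly to "dog" (shared active units) and not at all to "car".
cat_detector = PATTERNS["cat"]

if __name__ == "__main__":
    for name, pattern in PATTERNS.items():
        print(name, dot(cat_detector, pattern))  # cat 4, dog 3, car 0
```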

A large class of distributed connectionist models are concerned with pattern classification or pattern transformation. For example, Sejnowski and Rosenberg’s celebrated NETtalk model maps input patterns that represent a seven-letter window of text to output patterns that represent a phoneme (Sejnowski and Rosenberg, 1987). Rumelhart and McClelland’s verb learning model maps phonemic representations of present tense verbs to phonemic representations of the corresponding past tense, e.g., “hug” to “hugged” and “go” to “went” (Rumelhart and McClelland, 1986b). The weights in both of these models are derived by connectionist learning procedures from repetitive exposure to example inputs. Rumelhart and McClelland used a version of the perceptron learning algorithm, while Sejnowski and Rosenberg used the more recent back propagation algorithm of Rumelhart, Hinton, and Williams (1986).
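For readers unfamiliar with the perceptron learning rule, the following Python sketch trains a one-layer pattern associator on toy binary patterns. It is a simplified, deterministic form of the rule and does not reproduce Rumelhart and McClelland’s Wickelfeature encoding or their other model details; the function names and the training pairs are hypothetical.

```python
def step(x: float) -> int:
    """Threshold unit: on if net input is positive."""
    return 1 if x > 0 else 0


def train_perceptron(pairs, n_in, n_out, epochs=50, lr=1.0):
    """Perceptron rule for a one-layer associator; each output unit learns independently."""
    w = [[0.0] * n_in for _ in range(n_out)]
    b = [0.0] * n_out
    for _ in range(epochs):
        errors = 0
        for x, target in pairs:
            for j in range(n_out):
                y = step(sum(wi * xi for wi, xi in zip(w[j], x)) + b[j])
                delta = target[j] - y  # -1, 0, or +1
                if delta:
                    errors += 1
                    for i in range(n_in):
                        w[j][i] += lr * delta * x[i]
                    b[j] += lr * delta
        if errors == 0:  # every training pair reproduced correctly
            break
    return w, b


def recall(w, b, x):
    """Output pattern produced by the trained associator for input x."""
    return [step(sum(wi * xi for wi, xi in zip(wj, x)) + bj) for wj, bj in zip(w, b)]


# Toy pattern pairs (hypothetical): 2-unit input -> 2-unit output (OR, AND).
PAIRS = [
    ([0, 0], [0, 0]),
    ([0, 1], [1, 0]),
    ([1, 0], [1, 0]),
    ([1, 1], [1, 1]),
]

if __name__ == "__main__":
    w, b = train_perceptron(PAIRS, n_in=2, n_out=2)
    for x, t in PAIRS:
        print(x, recall(w, b, x), t)
```

Each output here is linearly separable, so the rule is guaranteed to converge; mappings that are not linearly separable are beyond a single layer of such units.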

Another large class of models perform constraint satisfaction by relaxation, such as Hopfield nets (Hopfield, 1982), the Boltzmann machine (Hinton and Sejnowski, 1986), and harmony theory (Smolensky, 1986). Boltzmann machines and harmony theory are stochastic models that relax by simulated annealing, in analogy with statistical mechanics. These networks also have learning algorithms.
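Relaxation can be illustrated with a Hopfield-style network in Python: patterns are stored with a Hebbian outer-product rule, and units are then updated one at a time until none changes, which never increases the energy E = -1/2 Σ w_ij s_i s_j. This sketch is deterministic and therefore omits the simulated annealing used by Boltzmann machines and harmony theory; the two stored patterns are invented.

```python
def store(patterns: list[list[int]]) -> list[list[int]]:
    """Hebbian outer-product weights: symmetric, with a zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def energy(w: list[list[int]], s: list[int]) -> float:
    """E = -1/2 * sum_ij w_ij s_i s_j."""
    n = len(s)
    return -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))


def relax(w: list[list[int]], s: list[int], max_sweeps: int = 10) -> list[int]:
    """Asynchronous updates in a fixed order until no unit changes state."""
    s = list(s)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(s)):
            h = sum(w[i][j] * s[j] for j in range(len(s)))
            new = 1 if h > 0 else -1 if h < 0 else s[i]  # keep state when h == 0
            if new != s[i]:
                s[i] = new
                changed = True
        if not changed:
            break
    return s


P1 = [1, 1, 1, 1, -1, -1, -1, -1]
P2 = [1, -1, 1, -1, 1, -1, 1, -1]  # orthogonal to P1

if __name__ == "__main__":
    w = store([P1, P2])
    probe = [-P1[0]] + P1[1:]  # P1 with its first unit flipped
    recalled = relax(w, probe)
    print(recalled == P1)                         # True
    print(energy(w, probe), energy(w, recalled))  # -12.0 -24.0
```

The network settles from a corrupted input to the nearest stored pattern; no rule is consulted, only the weights.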

An observation connectionists are fond of making is that there are no explicit rules in distributed models: all the knowledge is in the connection strengths. Since individual units are not meaningful as symbols (only activity patterns taken as a whole are meaningful), the connections between units cannot be regarded as symbolic-level rules, and the connectionist model’s behavior is not rule-based (Smolensky, in press; Derthick and Plaut, 1986).

The natural (but incorrect) counter to this argument is that it could be made about any symbol manipulation system if we choose too low a level of description, e.g., describing a digital computer by the behaviors of individual transistors. The flaw in this reasoning is that the computer’s circuitry is constrained a priori to implement a logically designed instruction set. Therefore one can abstract away from the transistor level to the instruction level of description without loss of information about the computational behavior of the machine. In contrast, distributed connectionist models are not constrained to implement machines with symbolic-level descriptions. They are typically constructed by learning procedures whose only goal is to minimize an error measure by modifying connection strengths. Connectionists therefore claim that since the learning procedure is not required (or even trying) to implement a machine that possesses a symbolic-level description, it is unlikely that the networks they construct will have such descriptions. To the extent that these networks exhibit intelligent behavior, their intelligence is at the subsymbolic level, not at the level of formal operations on symbol structures; the latter is at best an approximate description of the computation taking place.

In the remainder of this paper I will argue against the connectionist position, beginning with a reexamination of the symbolic-level theory of thermostats.

6. Beyond Pattern Transformation

Consider a graph of ambient temperature vs. setpoint. We can draw a line with slope 1 that divides the graph into two regions, one labeled “furnace on,” the other “furnace off.” Given any point specified by T and T₀, we can predict from the region the point falls in whether the thermostat will turn the furnace on or off. Furthermore, using back propagation we can train a one-unit connectionist network to do this, and it will automatically “generalize” to points not in the training set.
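The thermostat experiment can be reproduced in miniature. The Python sketch below assumes temperatures rescaled to [0, 1], a 5 × 5 training grid with boundary points (T = T₀) omitted, squared error, and batch gradient descent; for a single sigmoid unit, back propagation reduces to exactly this gradient rule. The learning rate, epoch count, and function names are arbitrary choices.

```python
import itertools
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_data(grid):
    """Hypothetical training set: label 1 ("furnace on") iff T < T0; boundary points skipped."""
    return [((t, t0), 1.0 if t < t0 else 0.0)
            for t, t0 in itertools.product(grid, grid) if t != t0]


def train_unit(data, epochs=500, lr=0.5):
    """Batch gradient descent on E = 0.5 * sum (y - target)**2 for one sigmoid unit."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        gw = [0.0, 0.0]
        gb = 0.0
        for (x1, x2), target in data:
            y = sigmoid(w[0] * x1 + w[1] * x2 + b)
            d = (y - target) * y * (1.0 - y)  # dE/dnet for this example
            gw[0] += d * x1
            gw[1] += d * x2
            gb += d
        w = [w[0] - lr * gw[0], w[1] - lr * gw[1]]
        b -= lr * gb
    return w, b


def furnace_on(w, b, t, t0):
    """The trained unit's decision: output above 0.5 means 'furnace on'."""
    return sigmoid(w[0] * t + w[1] * t0 + b) > 0.5


if __name__ == "__main__":
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]  # temperatures rescaled to [0, 1] (assumption)
    w, b = train_unit(make_data(grid))
    print(w, b)
    for t, t0 in [(0.1, 0.9), (0.9, 0.1), (0.55, 0.6), (0.7, 0.4)]:  # not in training set
        print(t, t0, furnace_on(w, b, t, t0))
```

Because this training set is symmetric under swapping T and T₀ while flipping the label, the learned weights come out (up to rounding) with w[0] = -w[1] and b near 0, so the unit’s decision boundary is the slope-1 line T = T₀ and off-grid points are classified correctly.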

This example demonstrates that the thermostat can be faithfully modeled as a one-neuron linear discriminator rather than as a symbol processing device. More demanding discriminations, involving multiple regions with complex shapes, would require more units and several processing layers, but they are not fundamentally different from this example. Connectionist learning schemes are apparently quite good at learning to make pattern discriminations; they are often better than previously known pattern recognition techniques. Furthermore, connectionist models can learn to transform patterns rather than merely classify them; under certain conditions a properly trained network can transform a novel pattern into another novel pattern, once it has learned the “rule” for doing so. But this has little to do with symbol processing.

If we examine the connectionist models that have been held up as evidence against the symbolic paradigm, we see that rather than attacking the symbol manipulation problem head on to demonstrate the illusory nature of symbol processing, they have instead been trivializing complex behaviors into simple tasks that can be solved by pattern transformation or just pattern recognition. The (rather limited) success of these models merely proves that symbol processing isn’t required for such tasks, just as it isn’t required to implement a thermostat.

There are several reasons why intelligent behavior should not be dismissed as simply a pattern transformation problem. First, as Fodor and Pylyshyn point out, language and thought have a highly combinatorial, compositional structure. Whether or not such structure is reflected at some hypothetical sub-symbolic level, connectionist models must at least behave as if they had such structures inside them. Pattern transformation systems do not meet this criterion unless they are trained on practically every structure they will ever encounter, as in Allen (1987).

Second, the notion of “variables” is essential to intelligent behavior, as it permits the manipulation of structures that were not specified in advance. One example is filling in the participants in a script. If we see John go into a restaurant where Mary works, we know that it is Mary who will bring the menu and John who will read it. This is not conscious problem solving; it’s the sort of common sense reasoning that takes place at the intuitive level. The restaurant script includes the variables Customer and Waitress, and our intuitive comprehension of the situation leads us to conclude that John and Mary play those respective roles; this in turn allows John and Mary to be instantiated elsewhere in the script to predict things like who reads the menu. Another example, due to Pinker and Prince, is morphological reduplication, which copies an entire stem as a unit, yielding forms such as “dum-dum” or “boom-boom.” The variable, in this case, references the stem to be copied.
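Both examples turn on a variable that can be bound to something not specified in advance. A minimal Python sketch, with a hypothetical three-event fragment of the restaurant script (the event names are invented, not drawn from any published script):

```python
# Hypothetical restaurant script: events mention role variables, not individuals.
RESTAURANT_SCRIPT = [
    ("enter", "Customer"),
    ("bring-menu", "Waitress", "Customer"),
    ("read-menu", "Customer"),
]


def instantiate(script, bindings):
    """Substitute the individual bound to each role variable throughout the script."""
    return [tuple(bindings.get(term, term) for term in event) for event in script]


def reduplicate(stem: str) -> str:
    """Copy the whole stem as a unit; `stem` is a variable over any string."""
    return f"{stem}-{stem}"


if __name__ == "__main__":
    roles = {"Customer": "John", "Waitress": "Mary"}
    for event in instantiate(RESTAURANT_SCRIPT, roles):
        print(event)
    # ('enter', 'John'), ('bring-menu', 'Mary', 'John'), ('read-menu', 'John')
    print(reduplicate("dum"), reduplicate("boom"), reduplicate("blick"))
```

Note that `reduplicate` works for a stem such as "blick" that it has never encountered, because it copies whatever the variable is bound to.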

Third, as Drew McDermott notes in McClelland et al. (1986), the ability to contrast one thing with another (as in “She is more outgoing with her friends than with me, her advisor”) is an important part of reasoning. McDermott calls this property “thirdness.” How can a connectionist model formulate such propositions without doing symbol manipulation?

7. Conclusion

The nature of thought and language would appear sufficient to impose a symbol-processing level on connectionist networks, provided we choose a task rich enough to require this level rather than one so simple it can be solved by pattern transformation alone.

On the other hand, if the eliminative hypothesis is correct, connectionists should be able to point to mental phenomena that cannot be explained by symbol processing models, but are explained by connectionist ones. They are nowhere near the point where they can do this convincingly. Connectionist models are too primitive to reproduce even basic symbol processing behavior, and the space of symbolic-level models is huge, making it difficult to prove that no such model could ever account for a particular behavior.

Even if the eliminative hypothesis is disproved, leaving connectionism an implementation technique instead of an alternative paradigm for cognition, the distributed connectionist approach promises to be a source of many valuable insights. For example, Derthick’s Micro-KLONE, a connectionist version of KL-ONE (Brachman and Schmolze, 1985), shows how counterfactual reasoning by constructing plausible models can be formulated as a massively parallel constraint satisfaction problem involving thousands of sub-symbolic micro-inferences (Derthick, 1987). This approach to reasoning is a distinct departure from traditional AI methods. The parallel application of knowledge is a serious problem that AI has largely failed to address.


In conclusion, connectionist models are too new for us to determine the validity of the eliminative hypothesis. But if this revolutionary idea is to have any chance of success, connectionists must first construct more complex models that go beyond simple pattern transformation and relaxation. They must at least approximate the powerful linguistic and inferential abilities human beings are known to possess.